
    Some thoughts about Gaussian Processes

    No full text

    Implicit surfaces with globally regularised and compactly supported basis functions

    Get PDF
    We consider the problem of constructing a function whose zero set represents a surface, given sample points with surface normal vectors. Our contributions include a novel means of regularising multi-scale compactly supported basis functions that yields the desirable properties previously associated only with fully supported bases, and we show equivalence to a Gaussian process with a modified covariance function. We also provide a regularisation framework for a simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as on 4D time series data.
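
    The abstract describes fitting an implicit function to points and normals using compactly supported bases. Below is a minimal sketch of that general construction, not the paper's regularisation scheme: a Wendland C2 kernel, off-surface constraint points generated from the normals, and a plain ridge term. All function and parameter names are illustrative assumptions.

        import numpy as np

        def wendland_c2(r, s):
            # Wendland C2 compactly supported RBF: (1 - r/s)^4 (4 r/s + 1) for r < s, else 0.
            q = r / s
            return np.where(q < 1.0, (1.0 - q) ** 4 * (4.0 * q + 1.0), 0.0)

        def fit_implicit_surface(points, normals, support=0.3, eps=0.01, ridge=1e-8):
            # Off-surface constraint points along the normals: f = +eps outside, -eps inside.
            centers = np.vstack([points, points + eps * normals, points - eps * normals])
            targets = np.concatenate([np.zeros(len(points)),
                                      np.full(len(points), eps),
                                      np.full(len(points), -eps)])
            dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
            K = wendland_c2(dists, support)  # sparse in practice; dense here for clarity
            alpha = np.linalg.solve(K + ridge * np.eye(len(K)), targets)
            def f(x):
                r = np.linalg.norm(np.atleast_2d(x)[:, None, :] - centers[None, :, :], axis=-1)
                return wendland_c2(r, support) @ alpha
            return f  # the surface is the zero set {x : f(x) = 0}

    In practice the kernel matrix is sparse thanks to the compact support, which is what makes multi-million-point problems like those in the paper tractable; the dense solve above is for clarity only.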

    Sensitive and Scalable Online Evaluation with Theoretical Guarantees

    Full text link
    Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliably correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs comparisons based on document-pair preferences, and prove that it is considerate and has fidelity. We show empirically that, compared to previous multileaved comparison methods, PPM is more sensitive to user preferences and scalable with the number of rankers being compared.
    Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.
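
    As a rough illustration of multileaved comparison, the sketch below combines several rankings into one result list, infers document-pair preferences from clicks (a clicked document beats every unclicked document ranked above it), and credits each ranker accordingly. This is a simplified stand-in loosely in the spirit of PPM, not the paper's exact probabilistic construction or its sensitivity guarantees.

        import random

        def multileave(rankings, k):
            # Draw the next document from a randomly chosen ranker until k docs are shown.
            pool = set().union(*rankings)
            combined = []
            while len(combined) < min(k, len(pool)):
                i = random.randrange(len(rankings))
                doc = next((d for d in rankings[i] if d not in combined), None)
                if doc is not None:
                    combined.append(doc)
            return combined

        def pairwise_preferences(combined, clicks):
            # A clicked document is preferred over every unclicked document above it.
            prefs = []
            for rank, doc in enumerate(combined):
                if doc in clicks:
                    prefs += [(doc, other) for other in combined[:rank]
                              if other not in clicks]
            return prefs

        def credit(rankings, prefs):
            # Score each ranker by how many inferred preferences it agrees with.
            scores = [0] * len(rankings)
            for winner, loser in prefs:
                for i, r in enumerate(rankings):
                    if winner in r and loser in r and r.index(winner) < r.index(loser):
                        scores[i] += 1
            return scores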

    Local Algorithms for Block Models with Side Information

    Full text link
    There has been recent interest in understanding the power of local algorithms for optimization and inference problems on sparse graphs. Gamarnik and Sudan (2014) showed that local algorithms are weaker than global algorithms for finding large independent sets in sparse random regular graphs. Montanari (2015) showed that local algorithms are suboptimal for finding a community with high connectivity in sparse Erdős-Rényi random graphs. For the symmetric planted partition problem (also known as community detection in the block model) on sparse graphs, a simple observation is that local algorithms cannot have non-trivial performance. In this work we consider the effect of side information on local algorithms for community detection under the binary symmetric stochastic block model. In the block model with side information, each of the $n$ vertices is labeled $+$ or $-$ independently and uniformly at random; each pair of vertices is connected independently with probability $a/n$ if both of them have the same label and $b/n$ otherwise. The goal is to estimate the underlying vertex labeling given 1) the graph structure and 2) side information in the form of a vertex labeling positively correlated with the true one. Assuming that the ratio between in- and out-degree $a/b$ is $\Theta(1)$ and the average degree $(a+b)/2 = n^{o(1)}$, we characterize three different regimes under which a local algorithm, namely belief propagation run on the local neighborhoods, maximizes the expected fraction of vertices labeled correctly. Thus, in contrast to the case of symmetric block models without side information, we show that local algorithms can achieve optimal performance for the block model with side information.
    Comment: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file.
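
    The local algorithm named in the abstract is belief propagation seeded with the side information. The toy sketch below runs loopy BP with log-likelihood-ratio messages on the sparse graph, initialising each vertex's field from a noisy label that is correct with probability alpha; it ignores non-edges, which contribute only O(1/n) in the sparse regime. It is a simplified illustration, not the paper's analysed local-neighborhood variant.

        import numpy as np

        def bp_with_side_info(adj, noisy_labels, a, b, alpha, iters=10):
            # adj: dict vertex -> list of neighbors (edges present w.p. a/n or b/n).
            # noisy_labels[v] in {+1, -1}, correct with probability alpha > 1/2.
            J = 0.5 * np.log(a / b)  # edge coupling: same-label edges are a/b more likely
            h = {v: noisy_labels[v] * np.log(alpha / (1.0 - alpha)) for v in adj}
            msg = {(u, v): 0.0 for u in adj for v in adj[u]}  # log-likelihood ratios
            for _ in range(iters):
                new = {}
                for (u, v) in msg:
                    # Cavity field at u, excluding the message coming back from v.
                    field = h[u] + sum(msg[(w, u)] for w in adj[u] if w != v)
                    new[(u, v)] = 2.0 * np.arctanh(np.tanh(J) * np.tanh(field / 2.0))
                msg = new
            belief = {v: h[v] + sum(msg[(w, v)] for w in adj[v]) for v in adj}
            return {v: 1 if belief[v] >= 0 else -1 for v in adj}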

    Backbone of complex networks of corporations: The flow of control

    Full text link
    We present a methodology to extract the backbone of complex networks based on the weight and direction of links, as well as on nontopological properties of nodes. We show how the methodology can be applied in general to networks in which mass or energy is flowing along the links. In particular, the procedure enables us to address important questions in economics, namely, how control and wealth are structured and concentrated across national markets. We report on the first cross-country investigation of ownership networks, focusing on the stock markets of 48 countries around the world. On the one hand, our analysis confirms results expected on the basis of the literature on corporate control, namely, that in Anglo-Saxon countries control tends to be dispersed among numerous shareholders. On the other hand, it also reveals that in the same countries, control is found to be highly concentrated at the global level, namely, lying in the hands of very few important shareholders. Interestingly, the exact opposite is observed for European countries. These results have not previously been reported, as they are not observable without the kind of network analysis developed here.
    Comment: 24 pages, 12 figures; 2nd version (text made more concise and readable, results unchanged).
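
    The abstract does not spell out the extraction procedure, so the sketch below shows only a generic weight-based backbone filter on a directed ownership network: for each firm, keep the strongest incoming ownership links until a fixed fraction of total incoming weight is covered. The authors' method additionally uses nontopological node properties, which this illustration omits.

        def extract_backbone(edges, coverage=0.8):
            # edges: (shareholder, firm, weight) triples, e.g. ownership fractions.
            # For each firm, keep the strongest incoming links until `coverage`
            # of its total incoming weight is accounted for.
            incoming = {}
            for src, dst, w in edges:
                incoming.setdefault(dst, []).append((w, src))
            backbone = []
            for dst, links in incoming.items():
                total = sum(w for w, _ in links)
                covered = 0.0
                for w, src in sorted(links, key=lambda ws: ws[0], reverse=True):
                    backbone.append((src, dst, w))
                    covered += w
                    if covered >= coverage * total:
                        break
            return backbone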

    Kernel Dependency Estimation

    Get PDF
    We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees, or graphs. Such a task is made possible by employing similarity measures in both input and output spaces via kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
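
    A minimal sketch of the KDE pipeline as described: kernel PCA on the output kernel embeds each output object as a vector, kernel ridge regression maps inputs to those embeddings, and a pre-image step decodes back to an output object. The nearest-neighbour pre-image used here is an assumption for brevity; in general the pre-image problem is solved per output type.

        import numpy as np

        def kernel_dependency_estimation(K_in, K_out, lam=1e-3, p=10):
            # K_in, K_out: (n, n) kernel matrices on training inputs / outputs.
            n = len(K_out)
            H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
            vals, vecs = np.linalg.eigh(H @ K_out @ H)   # kernel PCA on the outputs
            top = np.argsort(vals)[::-1][:p]
            Z = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))  # output embeddings
            A = np.linalg.solve(K_in + lam * np.eye(n), Z)          # kernel ridge regression
            def predict(k_test):
                # k_test: length-n vector of input-kernel values against the training set.
                z_hat = k_test @ A
                # Pre-image by nearest training output in embedding space (an assumption).
                return int(np.argmin(((Z - z_hat) ** 2).sum(axis=1)))
            return predict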

    Assessment on experimental bacterial biofilms and in clinical practice of the efficacy of sampling solutions for microbiological testing of endoscopes

    Get PDF
    Opinions differ on the value of microbiological testing of endoscopes, which varies according to the technique used. We compared the efficacy on bacterial biofilms of sampling solutions used for the surveillance of contamination of endoscope channels. To compare efficacy, we used an experimental model of a 48-h Pseudomonas biofilm grown on endoscope internal tubing. Sampling of this experimental biofilm was performed with a Tween 80-lecithin-based solution, saline, and sterile water. We also performed a randomized prospective study during routine clinical practice in our hospital, sampling endoscopes after reprocessing with one of two solutions assigned at random. Biofilm recovery, expressed as the logarithmic ratio of bacteria recovered to bacteria initially present in the biofilm, was significantly greater with the Tween 80-lecithin-based solution than with saline (P = 0.002) or sterile water (P = 0.002). There was no significant difference between saline and sterile water. In the randomized clinical study, the rates of endoscopes found contaminated were 8/25 with the Tween 80-lecithin-based sampling solution and 1/25 with saline (P = 0.02), and the mean numbers of bacteria recovered were 281 and 19 CFU/100 ml, respectively (P = 0.001). In conclusion, the efficiency, and therefore the value, of monitoring endoscope reprocessing by microbiological cultures depends on the sampling solution used. A sampling solution with a tensioactive action is more efficient than saline at detecting biofilm contamination of endoscopes.

    Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data

    Full text link
    User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, a data owner can continuously record data generated by distributed users and build various predictive models from the data to improve their operations, services, and revenue. Due to the large size and evolving nature of user data, data owners may rely on public cloud service providers (Cloud) for storage and computation scalability. Exposing sensitive user-generated data and advanced analytic models to Cloud raises privacy concerns. We present a confidential learning framework, SecureBoost, for data owners who want to learn predictive models from aggregated user-generated data but offload the storage and computational burden to Cloud without having to worry about protecting the sensitive data. SecureBoost allows users to submit encrypted or randomly masked data directly to the designated Cloud. Our framework utilizes random linear classifiers (RLCs) as the base classifiers in the boosting framework to dramatically simplify the design of the proposed confidential boosting protocols, yet still preserve the model quality. A Cryptographic Service Provider (CSP) is used to assist the Cloud's processing, reducing the complexity of the protocol constructions. We present two constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. For a boosted model, the Cloud learns only the RLCs and the CSP learns only the weights of the RLCs. Finally, the data owner collects the two parts to get the complete model. We conduct extensive experiments to understand the quality of the RLC-based boosting and the cost distribution of the constructions. Our results show that SecureBoost can efficiently learn high-quality boosting models from protected user-generated data.
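
    The learning component can be sketched in plaintext: AdaBoost with random linear classifiers as base learners, where each round samples random hyperplanes and keeps the one with the lowest weighted error. This deliberately omits the homomorphic-encryption and garbled-circuit protocols that are the paper's main contribution; all names and parameters are illustrative.

        import numpy as np

        def rlc_boost(X, y, rounds=50, candidates=200, seed=0):
            # AdaBoost with random linear classifiers (RLCs); y in {-1, +1}.
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w = np.full(n, 1.0 / n)
            model = []
            for _ in range(rounds):
                best = None
                for _ in range(candidates):
                    u, b = rng.standard_normal(d), rng.standard_normal()
                    pred = np.sign(X @ u + b)
                    err = w @ (pred != y)
                    if best is None or err < best[0]:
                        best = (err, u, b, pred)
                err, u, b, pred = best
                err = np.clip(err, 1e-10, 1.0 - 1e-10)
                a = 0.5 * np.log((1.0 - err) / err)   # boosting weight of this RLC
                w *= np.exp(-a * y * pred)
                w /= w.sum()
                model.append((a, u, b))
            return lambda Xt: np.sign(sum(a * np.sign(Xt @ u + b) for a, u, b in model))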

    Implicitly Constrained Semi-Supervised Least Squares Classification

    Full text link
    We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data over the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show that this approach can be formulated as a quadratic programming problem and that its solution can be found using a simple gradient descent procedure. We prove that, in a certain sense, our method never leads to performance worse than that of the supervised classifier. Experimental results on benchmark datasets corroborate this theoretical result in the multidimensional case, also in terms of the error rate.
    Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, France.
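
    A minimal sketch of the ICLS idea as stated: soft labels q in [0, 1] for the unlabeled points imply a least squares solution beta(q), and we choose the q whose beta(q) minimises the squared loss on the labeled data. The finite-difference gradient and the {0, 1} label encoding are implementation assumptions, not necessarily the paper's choices.

        import numpy as np

        def icls(X_lab, y_lab, X_unl, steps=200, lr=0.05):
            # Implicitly constrained least squares (sketch); labels encoded in {0, 1}.
            X = np.vstack([X_lab, X_unl])
            def beta(q):
                # Least squares solution implied by soft labels q for the unlabeled data.
                return np.linalg.lstsq(X, np.concatenate([y_lab, q]), rcond=None)[0]
            def sup_loss(q):
                # Supervised squared loss of that solution on the labeled data only.
                return float(((X_lab @ beta(q) - y_lab) ** 2).sum())
            q = np.full(len(X_unl), 0.5)
            for _ in range(steps):
                grad = np.zeros_like(q)
                for i in range(len(q)):   # finite-difference gradient, for brevity
                    e = np.zeros_like(q)
                    e[i] = 1e-4
                    grad[i] = (sup_loss(q + e) - sup_loss(q - e)) / 2e-4
                q = np.clip(q - lr * grad, 0.0, 1.0)  # projected gradient step
            return beta(q)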